Roberto Alsina
BOXES (WIP Title)
Chapter 1
Introduction
This book tries to achieve only one thing:
Show you a project go from nothing to good.
By nothing I mean no code at all. Not even a fleshed-out idea of what it does. No goals, no commitments. Just a vague interest.
And by good I mean it will work, it will have tests, it will be available to use, it will be useful and be a real thing.
Think of it as a sort of documentary on the beginnings of a rock band, only instead of rockers there is a single overweight Argentinian dev, and instead of a band there is a piece of software.
So, not much like a documentary on the beginnings of a rock band.
How it’s done
- It’s written in markdown
- The sections with code are fed to pyliterate and its output is built into a “book” by mdbook
- The code uses a ton of things; links are provided in the Dependencies Appendix
Chapter 2
Finger Thinking
This is the first part of the book, and the goal is starting a project from scratch. This happens in the beginning of many programming books, but I have this feeling that they are lying to me.
The code seems rehearsed, there are no errors, everything progresses monotonically towards a lovely pyramid of code with no false starts and no wrong assumptions.
In the decades I have programmed, that has never happened to me. Not once. So, I guess those “start from scratch” things are actually done backwards, from the working butterfly into a fake larva, or they are much better at this than I. Since I am not that awful a programmer, I have decided to believe the former, and take a shot at doing it the opposite way.
Everything you see here is in the order I wrote it. I have, occasionally, gone back and cleaned up a bad idea, but trust me, there are plenty still left.
***
Sometimes, when you start a project, you will find that you don’t have a clear goal. Some of my most fun projects were created because I was doing the equivalent of doodling on a computer.
Sometimes, it starts with a post on a blog, or a link on Twitter, and I just think “hmmm, I wonder how that’s done” or “well, that sounds fun to write” or “that doesn’t look so hard!” (those usually are so hard.)
So, I doodle on the interpreter, or in a throwaway script file. No tests, no requirements, no fuss.
The first part of this book is, then, an iterative doodle, showing the infancy of a project, and almost literally thinking with my fingers, and having them tell me where to go.
In later sections, things will get much more serious. So, enjoy this part.
Chapter 3
BOXES v1
Welcome to Boxes v1. I want to be able to draw some boxes. By boxes I don’t mean actual boxes, but rather squares. I found a library called svgwrite that lets you do that pretty easily.
First let’s create a data structure. A simple class called Box.
```python
# lesson1.py
class Box():
    def __init__(self, x=0, y=0, w=1, h=1):
        """Accept arguments to define our box, and store them."""
        self.x = x
        self.y = y
        self.w = w
        self.h = h

    def __repr__(self):
        return 'Box(%s, %s, %s, %s)' % (self.x, self.y, self.w, self.h)
```

As you can see, that is a pretty simple class. And we can create a big box.

```python
# lesson1.py
big_box = Box(0, 0, 80, 100)
```

Or many boxes, using a list comprehension:

```python
# lesson1.py
many_boxes = [Box() for i in range(5000)]
```

So now we have a big box, and 5000 smaller boxes, all alike.

```python
# Print the first 10 boxes
print(many_boxes[:10])
```

```
[Box(0, 0, 1, 1), Box(0, 0, 1, 1), Box(0, 0, 1, 1), Box(0, 0, 1, 1), Box(0, 0, 1, 1), Box(0, 0, 1, 1), Box(0, 0, 1, 1), Box(0, 0, 1, 1), Box(0, 0, 1, 1), Box(0, 0, 1, 1)]
```
And yes, we can draw those boxes.
```python
# lesson1.py
import svgwrite

def draw_boxes(boxes):
    dwg = svgwrite.Drawing('lesson1.svg', profile='full', size=(5, 2))
    for box in boxes:
        dwg.add(
            dwg.rect(
                insert=(box.x, box.y),
                size=(box.w, box.h),
                fill='red'
            )
        )
    dwg.save()

draw_boxes(many_boxes)
```
And here is the output:
That ... was not very interesting. It’s a single small red square!
Remember all our boxes have the same size and position!
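A quick way to confirm that without opening the SVG is to count how many distinct geometries the list holds. This is a small sketch that redefines a minimal `Box` so it runs on its own; `distinct_geometries` is a name made up for this example.

```python
class Box:
    """Minimal stand-in for the lesson's Box (same defaults)."""
    def __init__(self, x=0, y=0, w=1, h=1):
        self.x, self.y, self.w, self.h = x, y, w, h

many_boxes = [Box() for _ in range(5000)]

# Collect each box's geometry in a set: duplicates collapse into one entry.
distinct_geometries = {(b.x, b.y, b.w, b.h) for b in many_boxes}
print(len(many_boxes), "boxes,", len(distinct_geometries), "distinct geometry")
```

Five thousand rectangles drawn at the same spot look exactly like one.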
So ... we should do something better. Or at least more interesting, in lesson 2.
Chapter 4
BOXES v2
In our previous lesson we created a rather disappointing drawing using boxes. Let’s introduce a new wrinkle, and lay out the many boxes.
This code is just like before:
```python
class Box():
    def __init__(self, x=0, y=0, w=1, h=1):
        """We accept a few arguments to define our box, and we store them."""
        self.x = x
        self.y = y
        self.w = w
        self.h = h

    def __repr__(self):
        """This is what is shown if we print a Box. We want it to be useful."""
        return 'Box(%s, %s, %s, %s)' % (self.x, self.y, self.w, self.h)

many_boxes = [Box() for i in range(5000)]
```
But now, so they are not all stuck one on top of the other, let’s lay the boxes down in a line, one next to the other.
```python
# We add a "separation" constant so you can see the boxes individually
separation = .2

def layout(boxes):
    for i, box in enumerate(boxes):
        box.x = i * (1 + separation)

layout(many_boxes)
```
And we can now see that they all have different coordinates by printing a few of them. And yes, some of those numbers do look funny. Floating point numbers are weird.
```python
print([(box.x, box.y) for box in many_boxes[:10]])
```

```
[(0.0, 0), (1.2, 0), (2.4, 0), (3.5999999999999996, 0), (4.8, 0), (6.0, 0), (7.199999999999999, 0), (8.4, 0), (9.6, 0), (10.799999999999999, 0)]
```
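If those funny digits worry you, here is a small sketch of what is going on with the fourth box. It is not part of the lesson code. It only shows that the value is as close to 3.6 as a float can get, and that rounding is something to do for display, not for the stored coordinate.

```python
import math

separation = .2
x = 3 * (1 + separation)     # the x coordinate of the fourth box

print(x == 3.6)              # False: x is 3.5999999999999996
print(math.isclose(x, 3.6))  # True: equal within floating point tolerance
print(round(x, 2))           # 3.6, rounded only for display
```

For drawing purposes the difference is far smaller than a pixel, so the layout code can ignore it.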
Let’s draw them!
```python
import svgwrite

def draw_boxes(boxes):
    dwg = svgwrite.Drawing('lesson2.svg', profile='full', size=(100, 5))
    for box in boxes:
        dwg.add(dwg.rect(insert=(box.x, box.y), size=(box.w, box.h), fill='red'))
    dwg.save()

draw_boxes(many_boxes)
```
And here is the output:
That was more or less what we expected, right? Of course, since there are 5000 small boxes, that row of boxes goes on for quite a while.
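To put a number on “quite a while”, here is a back-of-the-envelope sketch. It assumes the same box width and separation as the layout above, and compares the row against the 100-unit width passed to `svgwrite.Drawing`. The variable names are made up for this example.

```python
separation = .2
box_width = 1
count = 5000

# The last box starts (count - 1) steps in, and then adds its own width.
last_x = (count - 1) * (box_width + separation)
row_length = last_x + box_width

canvas_width = 100  # the width passed to svgwrite.Drawing in this lesson
print(round(row_length, 1))              # 5999.8 units
print(round(row_length / canvas_width))  # about 60 canvases wide
```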
We could just go to the right for a while, then start a new row. Let’s do that in the next lesson.
Chapter 5
BOXES v3
In our previous lesson we ended with something like a line of army ants, all our boxes lined up. Let’s make it better by making them organize themselves in rows.
This code is just like before:
```python
class Box():
    def __init__(self, x=0, y=0, w=1, h=1):
        """We accept a few arguments to define our box, and we store them."""
        self.x = x
        self.y = y
        self.w = w
        self.h = h

    def __repr__(self):
        """This is what is shown if we print a Box. We want it to be useful."""
        return 'Box(%s, %s, %s, %s)' % (self.x, self.y, self.w, self.h)

many_boxes = [Box() for i in range(5000)]
```
But now, let’s organize our boxes in rank and file. In fact, let’s put our many boxes inside a big box.
```python
big_box = Box(0, 0, 50, 80)
```
We will get our boxes one at a time, put the first in 0,0 and the next one right at its right, and so on, and when we are about to step outside of the big box, we go back to the left, a little down, and do it all over again.
```python
# We add a "separation" constant so you can see the boxes individually
separation = .2

def layout(_boxes):
    # Because we modify the box list, we will work on a copy
    boxes = _boxes[:]
    # The 1st box is at 0,0 so no need to do anything with it, right?
    previous = boxes.pop(0)
    while boxes:
        # We take the new 1st box
        box = boxes.pop(0)
        # And put it next to the other
        box.x = previous.x + previous.w + separation
        # At the same vertical location
        box.y = previous.y
        # But if it's too far to the right...
        if (box.x + box.w) > big_box.w:
            # We go all the way left and a little down
            box.x = 0
            box.y = previous.y + previous.h + separation
        previous = box

layout(many_boxes)
```
And now we can draw it. Just so we are sure we are staying inside the big box, we will draw it too, in lightyellow.
```python
import svgwrite

def draw_boxes(boxes):
    dwg = svgwrite.Drawing('lesson3.svg', profile='full', size=(100, 100))
    dwg.add(dwg.rect(insert=(big_box.x, big_box.y),
                     size=(big_box.w, big_box.h), fill='lightyellow'))
    for box in boxes:
        dwg.add(dwg.rect(insert=(box.x, box.y), size=(box.w, box.h), fill='red'))
    dwg.save()

draw_boxes(many_boxes)
```
And here is the output:
That is strangely satisfying! Of course we are doing something wrong in that we are overflowing the big box vertically.
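How badly do we overflow? A rough sketch of the arithmetic, assuming the sizes used in this lesson (a 50 by 80 big box, 1 by 1 small boxes, 0.2 of separation). The names are invented for the example.

```python
import math

separation = .2
box_w = box_h = 1
big_w, big_h = 50, 80
count = 5000

# Boxes that fit in one row: the first one, plus one per (width + gap) step.
per_row = math.floor((big_w - box_w) / (box_w + separation)) + 1
rows = math.ceil(count / per_row)
# Total height: every row, plus a gap between consecutive rows.
needed_h = rows * box_h + (rows - 1) * separation

print(per_row, rows, round(needed_h, 1))  # 41 122 146.2
print(needed_h > big_h)                   # True: we overflow vertically
```

So the boxes need almost twice the height the big box offers.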
So, we could have more than one big box. And use them as pages?
Chapter 6
BOXES v4
In the previous lesson we totally nailed drawing between the lines ... horizontally. Let’s improve on that by being bidimensional.
This code is just like before:
```python
class Box():
    def __init__(self, x=0, y=0, w=1, h=1):
        """We accept a few arguments to define our box, and we store them."""
        self.x = x
        self.y = y
        self.w = w
        self.h = h

    def __repr__(self):
        """This is what is shown if we print a Box. We want it to be useful."""
        return 'Box(%s, %s, %s, %s)' % (self.x, self.y, self.w, self.h)

many_boxes = [Box() for i in range(5000)]
```
But now, instead of a big box, let’s have a list of, say, 10 pages (or large boxes), one below the other, slightly separated.
```python
pages = [Box(0, i * 55, 30, 50) for i in range(10)]
```
Of course our layout routine needs improvements to handle overflowing a page vertically.
```python
# We add a "separation" constant so you can see the boxes individually
separation = .2

def layout(_boxes):
    # Because we modify the box list, we will work on a copy
    boxes = _boxes[:]
    # We start at page 0
    page = 0
    # The 1st box should be placed in the correct page
    previous = boxes.pop(0)
    previous.x = pages[page].x
    previous.y = pages[page].y
    while boxes:
        # We take the new 1st box
        box = boxes.pop(0)
        # And put it next to the other
        box.x = previous.x + previous.w + separation
        # At the same vertical location
        box.y = previous.y
        # But if it's too far to the right...
        if (box.x + box.w) > pages[page].x + pages[page].w:
            # We go all the way left and a little down
            box.x = pages[page].x
            box.y = previous.y + previous.h + separation
            # But if we go too far down
            if box.y + box.h > pages[page].y + pages[page].h:
                # We go to the next page
                page += 1
                # And put the box at the top-left
                box.x = pages[page].x
                box.y = pages[page].y
        previous = box

layout(many_boxes)
```
And we need to change our drawing code to draw more than one page. Also, because we will run it more than once, I added an argument to choose the name of the output file.
```python
import svgwrite

def draw_boxes(boxes, name='lesson4.svg'):
    dwg = svgwrite.Drawing(name, profile='full', size=(100, 60))
    for page in pages:
        dwg.add(dwg.rect(insert=(page.x, page.y),
                         size=(page.w, page.h), fill='yellow'))
    for box in boxes:
        dwg.add(dwg.rect(insert=(box.x, box.y), size=(box.w, box.h), fill='red'))
    dwg.save()

draw_boxes(many_boxes)
```
And here is the output:
<img src="lesson4.svg" width="100%" style='border: 1px solid green; overflow: auto;'>
Would this work if the pages are arranged differently? Let’s put the pages side by side instead.
```python
pages = [Box(i * 35, 0, 30, 50) for i in range(10)]
layout(many_boxes)
draw_boxes(many_boxes, 'lesson4_side_by_side.svg')
```
<img src="lesson4_side_by_side.svg" width="100%" style='border: 1px solid green; overflow: auto;'>
And how about pages of different sizes?
```python
from random import randint

pages = [Box(i * 35, 0, 30 + randint(-3, 3), 50 + randint(-10, 10)) for i in range(10)]
layout(many_boxes)
draw_boxes(many_boxes, 'lesson4_random_sizes.svg')
```
<img src="lesson4_random_sizes.svg" width="100%" style='border: 1px solid green; overflow: auto;'>
So, we can fill pages and pages with little red squares now. Nice!
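Looking at the pictures is one check. Another is to ask the computer whether every box really ended up inside some page. The sketch below is a compact restatement of the idea, not the lesson’s code: `contains` is a helper made up for this example, and this `layout` takes the pages as an argument, keeps a cursor instead of a `previous` box, and assumes all boxes have the same height.

```python
class Box:
    def __init__(self, x=0, y=0, w=1, h=1):
        self.x, self.y, self.w, self.h = x, y, w, h

def contains(page, box):
    """True if box lies entirely inside page."""
    return (page.x <= box.x
            and page.y <= box.y
            and box.x + box.w <= page.x + page.w
            and box.y + box.h <= page.y + page.h)

def layout(boxes, pages, separation=0.2):
    """Cursor-based variant of the row/page filling described above."""
    page = 0
    x, y = pages[page].x, pages[page].y
    for box in boxes:
        if x + box.w > pages[page].x + pages[page].w:
            # Too far right: back to the left edge, one row down.
            x = pages[page].x
            y += box.h + separation
            if y + box.h > pages[page].y + pages[page].h:
                # Too far down: top-left of the next page.
                page += 1
                x, y = pages[page].x, pages[page].y
        box.x, box.y = x, y
        x += box.w + separation

pages = [Box(i * 35, 0, 30, 50) for i in range(10)]
boxes = [Box() for _ in range(5000)]
layout(boxes, pages)

stray = [b for b in boxes if not any(contains(p, b) for p in pages)]
print(len(stray))  # 0: every box sits inside a page
```

A check like this is cheap to rerun whenever the pages are rearranged.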
How about we make the squares not be all the same width?
```python
many_boxes = [Box(w=1 + randint(-5, 5) / 10) for i in range(5000)]
layout(many_boxes)
draw_boxes(many_boxes, 'lesson4_random_box_sizes.svg')
```
This adds “noise” to the width of the boxes, so they are now anything between 0.5 and 1.5 units wide.
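A tiny sketch to convince yourself of that range. Since `randint` includes both of its ends, it is enough to list every value it can produce. `possible_widths` is a name used only here.

```python
# randint(-5, 5) is inclusive on both ends, so there are 11 possible values.
possible_widths = [1 + n / 10 for n in range(-5, 6)]

print(min(possible_widths), max(possible_widths))  # 0.5 1.5
print(len(possible_widths))                        # 11
```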
That looks interesting...
Chapter 7
BOXES v5
In our previous lesson we created code that can fill a series of pages using many small boxes.
But only when those boxes are all alike. Once the boxes had different widths, the right side of our layout got all ragged. You surely have seen things like it when using word processors or reading web pages, where the text is all aligned on the left and all ragged on the right. It’s called a “left-aligned” or “ragged-right” layout.
The reason is that we exceed the page width by a random amount, and then, when moving that box to the next row, we are left a random amount short of the desired width.
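Here is a small sketch of that effect with hand-picked widths, so the numbers are easy to follow. It only looks at one dimension (no pages, no rows going down), and `row_slacks` is a function invented for this example.

```python
def row_slacks(widths, page_w, separation=0.2):
    """Return the unused space at the end of each completed row."""
    slacks = []
    x = 0.0        # where the next box would start
    right = 0.0    # right edge of the last box placed in this row
    for w in widths:
        if x + w > page_w:
            # The box does not fit: record what was left over, then wrap.
            slacks.append(page_w - right)
            x = 0.0
        right = x + w
        x = right + separation
    return slacks

# Hand-picked widths between 0.5 and 1.5, on a page 5 units wide.
widths = [1.0, 1.5, 0.5, 1.2, 0.8, 1.4, 0.6, 1.1, 0.9, 1.3]
print([round(s, 2) for s in row_slacks(widths, page_w=5)])  # [0.2, 0.5]
```

Each row ends a different distance away from the edge, which is exactly the raggedness we see.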
Could we make it look aligned on BOTH sides? Of course. Let’s try.
This code is just like before:
```python
class Box():
    def __init__(self, x=0, y=0, w=1, h=1):
        """We accept a few arguments to define our box, and we store them."""
        self.x = x
        self.y = y
        self.w = w
        self.h = h

    def __repr__(self):
        """This is what is shown if we print a Box. We want it to be useful."""
        return 'Box(%s, %s, %s, %s)' % (self.x, self.y, self.w, self.h)

# Many boxes with varying widths
from random import randint
many_boxes = [Box(w=1 + randint(-5, 5) / 10) for i in range(5000)]

# A few pages all the same size
pages = [Box(i * 35, 5, 30, 50) for i in range(10)]
```
And of course, we need a new layout function. The plan is this:
- Organize boxes in rows, like before.
- When we are about to go too wide, see how much “slack” is left between the right side of our last box in the row and the edge of the page.
- Spread that slack by sliding all boxes slightly right so no one notices.
```python
# We add a "separation" constant so you can see the boxes individually
separation = .2

def layout(_boxes):
    # Because we modify the box list, we will work on a copy
    boxes = _boxes[:]
    # We start at page 0
    page = 0
    # The 1st box should be placed in the correct page
    previous = boxes.pop(0)
    previous.x = pages[page].x
    previous.y = pages[page].y
    row = []
    while boxes:
        # We take the new 1st box
        box = boxes.pop(0)
        # And put it next to the other
        box.x = previous.x + previous.w + separation
        # At the same vertical location
        box.y = previous.y
        # But if it's too far to the right...
        if (box.x + box.w) > (pages[page].x + pages[page].w):
            # We adjust the row
            slack = (pages[page].x + pages[page].w) - (row[-1].x + row[-1].w)
            bump = slack / len(row)
            # The 1st box gets 0 bumps, the 2nd gets 1 and so on
            for i, b in enumerate(row):
                b.x += bump * i
            # We start a new row
            row = []
            # We go all the way left and a little down
            box.x = pages[page].x
            box.y = previous.y + previous.h + separation
            # But if we go too far down
            if box.y + box.h > pages[page].y + pages[page].h:
                # We go to the next page
                page += 1
                # And put the box at the top-left
                box.x = pages[page].x
                box.y = pages[page].y
        # Put the box in the row
        row.append(box)
        previous = box

layout(many_boxes)
```
The drawing code needs no changes.
```python
import svgwrite

def draw_boxes(boxes, name='lesson5.svg'):
    dwg = svgwrite.Drawing(name, profile='full', size=(150, 60))
    for page in pages:
        dwg.add(dwg.rect(insert=(page.x, page.y),
                         size=(page.w, page.h), fill='lightyellow'))
    for box in boxes:
        dwg.add(dwg.rect(insert=(box.x, box.y), size=(box.w, box.h), fill='red'))
    dwg.save()

draw_boxes(many_boxes)
```
Isn’t that nice? If you look at it from afar it looks sort of familiar. Doesn’t it?
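One detail of the slack arithmetic is worth checking by hand. The sketch below uses made-up numbers (four boxes, each 1 unit wide, on a page whose right edge is at 5) and compares two ways of sizing the bump: dividing the slack by the number of boxes, as the layout function does, and dividing by the number of gaps between them. `spread` is a helper invented for this example.

```python
page_right = 5.0
xs = [0.0, 1.2, 2.4, 3.6]   # left edges of four boxes, each 1 unit wide
width = 1.0
slack = page_right - (xs[-1] + width)   # about 0.4 units left over

def spread(xs, bump):
    """Shift box i to the right by i bumps, as in the layout function."""
    return [x + bump * i for i, x in enumerate(xs)]

by_boxes = spread(xs, slack / len(xs))        # divide by number of boxes
by_gaps = spread(xs, slack / (len(xs) - 1))   # divide by number of gaps

print(round(by_boxes[-1] + width, 2))  # 4.9: still 0.1 short of the edge
print(round(by_gaps[-1] + width, 2))   # 5.0: flush with the edge
```

With 25 or so boxes per row the shortfall is a small fraction of the slack, so it is hard to see in the drawing. Dividing by the number of gaps would need a guard for rows that hold a single box.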
Chapter 8
BOXES v6
In our previous lesson we created a fully justified layout of varying-width boxes spread across multiple pages. But we cheated.
To achieve full justification, we spread the “slack” evenly in the space between all boxes in the row. If we were trying to lay out text, that is not the proper way.
You see, text comes separated in words. And usually, in Western languages, the words have characters called spaces between them. So what we do, when laying out text, is to make the special space boxes slightly larger and keep the separation between boxes constant (in fact, we also tweak separations between letters, but let’s ignore that for now. Or for ever).
How about we choose some boxes and decide they, and only they, are stretchy?
That way, our strategy to fully justify the text will be: stretch the stretchy bits on each row just enough so that the row is exactly the width we need.
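Before touching the real layout function, here is the strategy on its own, as a sketch. It works on plain lists of widths instead of `Box` objects, and `justify` is a name chosen for this example only.

```python
def justify(widths, stretchy, page_w, separation=0.2):
    """Return new widths so the row spans exactly page_w.

    widths   -- width of each box in the row
    stretchy -- parallel list of booleans, True for boxes that may grow
    """
    used = sum(widths) + separation * (len(widths) - 1)
    slack = page_w - used
    count = sum(stretchy)
    if count == 0:
        return list(widths)  # nothing can stretch; leave the row alone
    bump = slack / count
    return [w + bump if s else w for w, s in zip(widths, stretchy)]

widths = [1.0, 0.5, 1.0, 0.5, 1.0]
stretchy = [False, True, False, True, False]
new_widths = justify(widths, stretchy, page_w=6)
print([round(w, 2) for w in new_widths])  # [1.0, 1.1, 1.0, 1.1, 1.0]
```

The rigid boxes keep their width, and the two stretchy ones share the 1.2 units of slack between them.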
For the first time in a few lessons, we need to change our Box class:
```python
class Box():
    def __init__(self, x=0, y=0, w=1, h=1, stretchy=False):
        """We accept a few arguments to define our box, and we store them."""
        self.x = x
        self.y = y
        self.w = w
        self.h = h
        self.stretchy = stretchy

    def __repr__(self):
        """This is what is shown if we print a Box. We want it to be useful."""
        return 'Box(%s, %s, %s, %s)' % (self.x, self.y, self.w, self.h)

# Many boxes with varying widths, and about 1 in 6 will be stretchy
from random import randint
many_boxes = [Box(w=1 + randint(-5, 5) / 10, stretchy=(randint(0, 5) == 4)) for i in range(5000)]

# A few pages all the same size
pages = [Box(i * 35, 5, 30, 50) for i in range(10)]
```
The changes in the layout function are not so big.
```python
# We add a "separation" constant so you can see the boxes individually
separation = .2

def layout(_boxes):
    # Because we modify the box list, we will work on a copy
    boxes = _boxes[:]
    # We start at page 0
    page = 0
    # The 1st box should be placed in the correct page
    previous = boxes.pop(0)
    previous.x = pages[page].x
    previous.y = pages[page].y
    row = []
    while boxes:
        # We take the new 1st box
        box = boxes.pop(0)
        # And put it next to the other
        box.x = previous.x + previous.w + separation
        # At the same vertical location
        box.y = previous.y
        # But if it's too far to the right...
        if (box.x + box.w) > (pages[page].x + pages[page].w):
            # We adjust the row
            slack = (pages[page].x + pages[page].w) - (row[-1].x + row[-1].w)
            stretchies = [b for b in row if b.stretchy]
            if stretchies:
                bump = slack / len(stretchies)
                # Each stretchy gets wider
                for b in stretchies:
                    b.w += bump
                # And we put each thing next to the previous one
                for j, b in enumerate(row[1:], 1):
                    b.x = row[j-1].x + row[j-1].w + separation
            else:
                # Nothing stretches!!! Do it like before.
                bump = slack / len(row)
                for i, b in enumerate(row):
                    b.x += bump * i
            # We start a new row
            row = []
            # We go all the way left and a little down
            box.x = pages[page].x
            box.y = previous.y + previous.h + separation
            # But if we go too far down
            if box.y + box.h > pages[page].y + pages[page].h:
                # We go to the next page
                page += 1
                # And put the box at the top-left
                box.x = pages[page].x
                box.y = pages[page].y
        # Put the box in the row
        row.append(box)
        previous = box

layout(many_boxes)
```
The drawing code needs a change so we can see the “stretchy” boxes in a different color.
```python
import svgwrite

def draw_boxes(boxes, name='lesson6.svg'):
    dwg = svgwrite.Drawing(name, profile='full', size=(100, 60))
    for page in pages:
        dwg.add(dwg.rect(insert=(page.x, page.y),
                         size=(page.w, page.h), fill='lightyellow'))
    for box in boxes:
        color = 'green' if box.stretchy else 'red'
        dwg.add(dwg.rect(insert=(box.x, box.y), size=(box.w, box.h), fill=color))
    dwg.save()

draw_boxes(many_boxes)
```
This layout strategy works:
- With multiple pages of arbitrary sizes and positions
- With many boxes of different widths and stretch capabilities
- Even if nothing can stretch
But the next lesson will start taking things to the next level.
Chapter 9
BOXES v7
So far in our previous lessons we have worked in an abstract world of boxes. Some hints of a direction were visible, like organizing our boxes in pages and trying to achieve a justified layout among others.
So, let’s just say it: we are going to be doing text layout. But not the easy one. No, sir. No monospaced fonts for us. We want to do the whole enchilada: we are going to have variable-width fonts with kerning, and multi-page, fully-justified text layouts with hyphenation.
Ok, perhaps about 50% of the enchilada, because there is no bidirectional support, it’s only in English, only for UTF-8 encoded files, and so on for a lot of things. But it’s still a lot of Mexican food!
And we are going to do that in lessons not much longer than the ones you have been seeing so far. So let’s get started.
Clearly, we want our boxes to have letters. And our “stretchy” boxes are special because they have things like spaces. In fact, let’s just say they have spaces.
We will now expand our Box class to support letters inside the boxes.
```python
class Box():
    def __init__(self, x=0, y=0, w=1, h=1, stretchy=False, letter='x'):
        """We accept a few arguments to define our box, and we store them."""
        self.x = x
        self.y = y
        self.w = w
        self.h = h
        self.stretchy = stretchy
        self.letter = letter

    def __repr__(self):
        """This is what is shown if we print a Box. We want it to be useful."""
        return 'Box(%s, %s, %s, %s, "%s")' % (self.x, self.y, self.w, self.h, self.letter)

# A few pages all the same size
pages = [Box(i * 35 + 1, 1, 30, 50) for i in range(10)]

# Many boxes, all the same width, with an x in them
text_boxes = [Box() for i in range(5000)]
```
We can actually use the exact same layout function.
```python
# We add a "separation" constant so you can see the boxes individually
separation = .2

def layout(_boxes):
    # Because we modify the box list, we will work on a copy
    boxes = _boxes[:]
    # We start at page 0
    page = 0
    # The 1st box should be placed in the correct page
    previous = boxes.pop(0)
    previous.x = pages[page].x
    previous.y = pages[page].y
    row = []
    while boxes:
        # We take the new 1st box
        box = boxes.pop(0)
        # And put it next to the other
        box.x = previous.x + previous.w + separation
        # At the same vertical location
        box.y = previous.y
        # But if it's too far to the right...
        if (box.x + box.w) > (pages[page].x + pages[page].w):
            # We adjust the row
            slack = (pages[page].x + pages[page].w) - (row[-1].x + row[-1].w)
            stretchies = [b for b in row if b.stretchy]
            if stretchies:
                bump = slack / len(stretchies)
                # Each stretchy gets wider
                for b in stretchies:
                    b.w += bump
                # And we put each thing next to the previous one
                for j, b in enumerate(row[1:], 1):
                    b.x = row[j-1].x + row[j-1].w + separation
            else:
                # Nothing stretches!!! Do it like before.
                bump = slack / len(row)
                for i, b in enumerate(row):
                    b.x += bump * i
            # We start a new row
            row = []
            # We go all the way left and a little down
            box.x = pages[page].x
            box.y = previous.y + previous.h + separation
            # But if we go too far down
            if box.y + box.h > pages[page].y + pages[page].h:
                # We go to the next page
                page += 1
                # And put the box at the top-left
                box.x = pages[page].x
                box.y = pages[page].y
        # Put the box in the row
        row.append(box)
        previous = box

layout(text_boxes)
```
And tweak the drawing function to show us letters, and to make the colored boxes optional.
```python
import svgwrite

def draw_boxes(boxes, name='lesson7.svg', hide_boxes=False):
    dwg = svgwrite.Drawing(name, profile='full', size=(32, 20))
    for page in pages:
        dwg.add(dwg.rect(insert=(page.x, page.y),
                         size=(page.w, page.h), fill='lightyellow'))
    for box in boxes:
        color = 'green' if box.stretchy else 'red'
        if not hide_boxes:
            dwg.add(dwg.rect(insert=(box.x, box.y), size=(box.w, box.h), fill=color))
        if box.letter:
            dwg.add(dwg.text(box.letter, insert=(box.x, box.y + box.h),
                             font_size=box.h, font_family='Arial'))
    dwg.save()

draw_boxes(text_boxes)
```
Of course this is very boring, so we need to spice up our data a little. We can use different letters, and then make the right ones stretchy. That is easy!
```python
from random import choice

for box in text_boxes:
    # More than one space so they appear often
    box.letter = choice(' abcdefghijklmnopqrstuvwxyz')
    if box.letter == ' ':
        # Spaces are stretchy
        box.stretchy = True

layout(text_boxes)
draw_boxes(text_boxes, 'lesson7_different_letters.svg')
```
As you can see, there are very minor horizontal shifts and stretches, since all boxes are the same size.
But as a text layout engine we have a major failure: we are ignoring the size of the letters we are laying out!
This is a very complex thing to do called text shaping. You need to understand the content of the font you are using to display the text, and more subtle things like what happens if you put specific letters next to each other (kerning) and much more.
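To get a feel for why neighbours matter, here is a toy sketch of kerning. Everything in it is hypothetical: the advance widths and the pair adjustments are invented numbers, and `text_width` is a made-up helper. A real font carries its own tables, and a shaping library does much more than this.

```python
# Hypothetical numbers, in units of the font size. A real font stores its own.
ADVANCE = {'A': 0.67, 'V': 0.67, 'T': 0.61, 'o': 0.56}
KERNING = {('A', 'V'): -0.07, ('T', 'o'): -0.11}

def text_width(text):
    """Sum per-letter advances, then apply adjustments for adjacent pairs."""
    width = sum(ADVANCE[ch] for ch in text)
    for left, right in zip(text, text[1:]):
        width += KERNING.get((left, right), 0)
    return width

print(round(text_width('AV'), 2))  # 1.27, narrower than 0.67 + 0.67
print(round(text_width('VA'), 2))  # 1.34, this pair has no entry in the table
```

The same two letters take a different amount of room depending on their order, so the width of a box cannot be decided by looking at its letter alone.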
The good news is that it’s already done for us, in libraries called HarfBuzz and FreeType.
This paragraph is perhaps the most important one in this book. I am about to show you some obscure code. And I will tell you the secret of how it got here: I copied it from the documentation for the libraries I am using. Sometimes you will need to do something complicated only once in your life. It’s perfectly ok to just google how to do it. And as long as you are confident you can find it again if needed, it’s ok to just forget about it.
I will show you this code, and then put it in a separate file called fonts.py and from now on I will not show it in the lessons, because we are not going to change it, ever.
```python
import harfbuzz as hb
import freetype2 as ft

def adjust_widths_by_letter(boxes):
    """Takes a list of boxes as arguments, and uses harfbuzz to
    adjust the width of each box to match the harfbuzz text shaping."""
    buf = hb.Buffer.create()
    buf.add_str(''.join(b.letter for b in boxes))
    buf.guess_segment_properties()
    font_lib = ft.get_default_lib()
    face = font_lib.find_face('Arial')
    face.set_char_size(size = 1, resolution=64)
    font = hb.Font.ft_create(face)
    hb.shape(font, buf)
    # at this point buf.glyph_positions has all the data we need
    for box, position in zip(boxes, buf.glyph_positions):
        box.w = position.x_advance
```
And now we will pretend we know what that does, based on its docstring, and use it.

```python
separation = .05
adjust_widths_by_letter(text_boxes)
layout(text_boxes)
draw_boxes(text_boxes, 'lesson7_adjusted_letters.svg')
```
And nicer, without the boxes:
```python
separation = .05
adjust_widths_by_letter(text_boxes)
layout(text_boxes)
draw_boxes(text_boxes, 'lesson7_adjusted_letters_no_boxes.svg', hide_boxes=True)
```
And of course, we can just load text there instead of random letters. For example, here we load what is going to be our example text from now on, Jane Austen’s Pride and Prejudice from Project Gutenberg.
```python
separation = .05

p_and_p = open('pride-and-prejudice.txt').read()
text_boxes = []
for l in p_and_p:
    text_boxes.append(Box(letter=l, stretchy=l==' '))
adjust_widths_by_letter(text_boxes)
layout(text_boxes)
draw_boxes(text_boxes, 'lesson7_pride_and_prejudice.svg', hide_boxes=True)
```
And that is ... maybe disappointing? While we spent a lot of time on things like justifying text, we have not even looked at newlines!
Also, spaces at the end of lines make the line appear ragged again, now that they are not boxes.
So, we know what to hit in the next lesson.
Chapter 10
BOXES v8
In the previous lesson we started using our layout engine to display text, and ran into some limitations. Let’s get rid of them.
We have no changes in our Box class, or the page setup, or how we load and adjust the boxes’ sizes. Also unchanged is the drawing code.
```python
from fonts import adjust_widths_by_letter

class Box():
    def __init__(self, x=0, y=0, w=1, h=1, stretchy=False, letter='x'):
        """We accept a few arguments to define our box, and we store them."""
        self.x = x
        self.y = y
        self.w = w
        self.h = h
        self.stretchy = stretchy
        self.letter = letter

    def __repr__(self):
        """This is what is shown if we print a Box. We want it to be useful."""
        return 'Box(%s, %s, %s, %s, "%s")' % (self.x, self.y, self.w, self.h, self.letter)

# A few pages all the same size
pages = [Box(i * 35 + 1, 1, 30, 50) for i in range(10)]

separation = .05

p_and_p = open('pride-and-prejudice.txt').read()
text_boxes = []
for l in p_and_p:
    text_boxes.append(Box(letter=l, stretchy=l==' '))
adjust_widths_by_letter(text_boxes)
```

```python
import svgwrite

def draw_boxes(boxes, name='lesson8.svg', hide_boxes=False):
    dwg = svgwrite.Drawing(name, profile='full', size=(32, 52))
    for page in pages:
        dwg.add(dwg.rect(insert=(page.x, page.y),
                         size=(page.w, page.h), fill='lightyellow'))
    for box in boxes:
        color = 'green' if box.stretchy else 'red'
        if not hide_boxes:
            dwg.add(dwg.rect(insert=(box.x, box.y), size=(box.w, box.h), fill=color))
        if box.letter:
            dwg.add(dwg.text(box.letter, insert=(box.x, box.y + box.h),
                             font_size=box.h, font_family='Arial'))
    dwg.save()
```
But we need to work on our layout engine, a lot. Here is the image of our attempt at displaying “Pride and Prejudice”:
Let’s count the problems:
- It totally ignores newlines everywhere
- It keeps spaces at the end of rows, making the right side ragged (see “said his ” in the seventh line)
- White space at the beginning of rows is shown and it looks bad (see " a neigh" at the beginning of the fifth line)
- Words are split between lines haphazardly, but this is for later and leads to some serious code that needs its own lesson.
In this section we will do things slightly differently than before, by making incremental improvements to the layout function, so this is going to be pretty long but with small changes.
Let’s hit the issues in order.
Newlines
The idea is: if we find a newline, we need to break the line. Doesn’t sound particularly complex, especially since lines that are broken intentionally are never fully justified.
The changes are minor:
- Create a flag break_line, set to True if we encounter a newline or overflow the page.
- In case of newline, make that box invisible by making it 0-wide and not stretchy.
- When the break_line flag is set, handle as usual by moving to the left, etc.
```python
# We add a "separation" constant so you can see the boxes individually
separation = .05

def layout(_boxes):
    # Because we modify the box list, we will work on a copy
    boxes = _boxes[:]
    # We start at page 0
    page = 0
    # The 1st box should be placed in the correct page
    previous = boxes.pop(0)
    previous.x = pages[page].x
    previous.y = pages[page].y
    row = []
    while boxes:
        # We take the new 1st box
        box = boxes.pop(0)
        # And put it next to the other
        box.x = previous.x + previous.w + separation
        # At the same vertical location
        box.y = previous.y

        # The next 10 lines are almost all the change
        break_line = False
        # But if it's a newline
        if (box.letter == '\n'):
            break_line = True
            # Newlines take no horizontal space ever
            box.w = 0
            box.stretchy = False
        # Or if it's too far to the right...
        elif (box.x + box.w) > (pages[page].x + pages[page].w):
            break_line = True
            # We adjust the row
            slack = (pages[page].x + pages[page].w) - (row[-1].x + row[-1].w)
            stretchies = [b for b in row if b.stretchy]
            if stretchies:
                bump = slack / len(stretchies)
                # Each stretchy gets wider
                for b in stretchies:
                    b.w += bump
                # And we put each thing next to the previous one
                for j, b in enumerate(row[1:], 1):
                    b.x = row[j-1].x + row[j-1].w + separation
            else:
                # Nothing stretches!!! Do it like before.
                bump = slack / len(row)
                for i, b in enumerate(row):
                    b.x += bump * i

        if break_line:
            # We start a new row
            row = []
            # We go all the way left and a little down
            box.x = pages[page].x
            box.y = previous.y + previous.h + separation
            # But if we go too far down
            if box.y + box.h > pages[page].y + pages[page].h:
                # We go to the next page
                page += 1
                # And put the box at the top-left
                box.x = pages[page].x
                box.y = pages[page].y

        # Put the box in the row
        row.append(box)
        previous = box

layout(text_boxes)
draw_boxes(text_boxes, 'lesson8_handle_newlines.svg', hide_boxes=True)
```
As mentioned, the code changes are small, but the output now looks radically different.
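The two reasons for breaking a line are easier to see with the geometry stripped away. The sketch below is a simplification made up for this purpose: every letter counts as one unit wide, rows are plain strings, and `break_rows` is not a function from the lesson.

```python
def break_rows(text, max_chars):
    """Split text into rows, breaking on newlines or when a row is full."""
    rows, row = [], ''
    for ch in text:
        if ch == '\n':
            # Forced break: close the row, the newline itself shows nothing.
            rows.append(row)
            row = ''
        elif len(row) == max_chars:
            # Overflow: close the row and start the next one with this letter.
            rows.append(row)
            row = ch
        else:
            row += ch
    rows.append(row)
    return rows

print(break_rows('ab\ncdefg', max_chars=3))  # ['ab', 'cde', 'fg']
```

The first row ends early because of the newline, the second because it ran out of room. Only the second kind would be a candidate for justification.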
Spaces against the right and left margins
You can see clearly in the previous sample output where this happens: in one of the later paragraphs, “to see the place, ” appears ragged when it should not. And a similar thing happens in an earlier paragraph, where there is a hole against the left margin in " told me all about it".
In both cases, the cause is that the “empty” space is used by spaces!
So, one possible solution is, when justifying a row, to make all the spaces at the right margin 0-width and not stretchy. At the same time, when adding spaces at the beginning of a row, they should become 0-width and not stretchy.
BUT this means the list of boxes will need its widths readjusted if they are to be laid out again on different pages! That’s because some of the spaces will now be thin and “rigid”, so they will work badly if they are not against the margin on a different layout.
It’s not a big problem, but it’s worth keeping in mind, since it’s the kind of thing that becomes an obscure bug later on. So, we add it to the docstring.
```python
# We add a "separation" constant so you can see the boxes individually
separation = .05

def layout(_boxes):
    """Layout boxes along pages.

    Keep in mind that this function modifies the boxes themselves, so
    you should be very careful about trying to call layout() more than once
    on the same boxes.

    Specifically, some spaces will become 0-width and not stretchy.
    """
    # Because we modify the box list, we will work on a copy
    boxes = _boxes[:]
    # We start at page 0
    page = 0
    # The 1st box should be placed in the correct page
    previous = boxes.pop(0)
    previous.x = pages[page].x
    previous.y = pages[page].y
    row = []
    while boxes:
        # We take the new 1st box
        box = boxes.pop(0)
        # And put it next to the other
        box.x = previous.x + previous.w + separation
        # At the same vertical location
        box.y = previous.y

        # The next 10 lines are almost all the change
        break_line = False
        # But if it's a newline
        if (box.letter == '\n'):
            break_line = True
            # Newlines take no horizontal space ever
            box.w = 0
            box.stretchy = False
        # Or if it's too far to the right...
        elif (box.x + box.w) > (pages[page].x + pages[page].w):
            break_line = True
            # We adjust the row
            # Remove all right-margin spaces
            while row[-1].letter == ' ':
                row.pop()
            slack = (pages[page].x + pages[page].w) - (row[-1].x + row[-1].w)
            stretchies = [b for b in row if b.stretchy]
            if stretchies:
                bump = slack / len(stretchies)
                # Each stretchy gets wider
                for b in stretchies:
                    b.w += bump
                # And we put each thing next to the previous one
                for j, b in enumerate(row[1:], 1):
                    b.x = row[j-1].x + row[j-1].w + separation
            else:
                # Nothing stretches!!! Do it like before.
                bump = slack / len(row)
                for i, b in enumerate(row):
                    b.x += bump * i

        if break_line:
            # We start a new row
            row = []
            # We go all the way left and a little down
            box.x = pages[page].x
            box.y = previous.y + previous.h + separation
            # But if we go too far down
            if box.y + box.h > pages[page].y + pages[page].h:
                # We go to the next page
                page += 1
                # And put the box at the top-left
                box.x = pages[page].x
                box.y = pages[page].y

        # Put the box in the row
        row.append(box)

        # Collapse all left-margin space
        if all(b.letter == ' ' for b in row):
            box.w = 0
            box.stretchy = False
            box.x = pages[page].x

        previous = box

layout(text_boxes)
draw_boxes(text_boxes, 'lesson8_handle_spaces.svg', hide_boxes=True)
```
As you can see, the justification now is absolutely tight where it needs to be. With that taken care of, we will keep hyphenation for the next lesson.
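Going back to the caveat in the docstring: one possible way to lay out the same text more than once is to keep the original boxes untouched and work on deep copies. This is only a sketch, not something the project does. The `Box` class here is a cut-down stand-in, and `collapse_spaces` is a hypothetical toy that mutates boxes the way `layout()` does.

```python
import copy

class Box:
    """Minimal stand-in for the book's Box class (only the fields used here)."""
    def __init__(self, w=1, stretchy=False, letter='x'):
        self.w = w
        self.stretchy = stretchy
        self.letter = letter

def collapse_spaces(boxes):
    """Toy stand-in for layout(): it mutates the boxes it is given."""
    for b in boxes:
        if b.letter == ' ':
            b.w = 0
            b.stretchy = False

original = [Box(letter=l, stretchy=(l == ' ')) for l in 'a b']
# Work on a deep copy, so the original boxes keep their width and stretchiness
working = copy.deepcopy(original)
collapse_spaces(working)

print(original[1].w, original[1].stretchy)  # 1 True
print(working[1].w, working[1].stretchy)    # 0 False
```

A plain slice like `_boxes[:]` would not be enough for this, because it copies the list but still shares the same `Box` objects.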
Chapter 11
BOXES v9
In our previous lesson we created a serviceable text layout engine. It has many problems, but remember that our goal is not to create the best possible thing; this is an educational experience. The spit and polish will appear later on.
But there is a glaring problem: it breaks words in all the wrong places. Examples appear in almost every line of the output. So, how does one fix that?
The traditional answer (and the one we will be using) is hyphenation, breaking words between lines in the correct places.
Instead of breaking anywhere, we will break only in the places where the rules of each language allow us to.
Just as it happened with text shaping, we are lucky to live at a moment in time when almost everything we need to do it right is already in place. In particular, we will use a library called Pyphen, mostly because I have already used it in another project.
Am I sure it’s the best one? No. Do I know exactly how it does what it does? No. I know enough to make it work, and it works well enough, so for this stage in the life of this project that is more than enough. In fact, it takes the rules for word-breaking from dictionaries provided by an Office Suite, so it does about as good a job as the dictionary does. It even supports subtleties such as the differences between British and American English!
Here’s an example of how it works:
import pyphen

dic = pyphen.Pyphen(lang='en_GB')
print('en_GB:', dic.inserted('dictionary', '-'))
dic = pyphen.Pyphen(lang='en_US')
print('en_US:', dic.inserted('dictionary', '-'))
en_GB: dic-tion-ary
en_US: dic-tio-nary
Keep in mind that this is not magic. If you feed it garbage, it will give you garbage.
dic = pyphen.Pyphen(lang='es_ES')
print('es_ES:', dic.inserted('dictionary', '-'))
es_ES: dic-tio-na-ry
Where is it proper to break a line?
- On a newline character
- On a space
- On a breaking point as defined by Pyphen
One of those things is not like the others. We have boxes with newlines in them and we have boxes with spaces in them, but there are no boxes with breaking points in them.
But we can add them! There is a Unicode character for that: SOFT HYPHEN (SHY)
It serves as an invisible marker used to specify a place in text where a hyphenated break is allowed without forcing a line break in an inconvenient place if the text is re-flowed. It becomes visible only after word wrapping at the end of a line.
So, if we insert them in all the right places, then we can use them to decide whether we are at a suitable breaking point.
dic = pyphen.Pyphen(lang='en_US')

# '\xad' is the Soft Hyphen (SHY) character
def insert_soft_hyphens(text, hyphen='\xad'):
    """Insert the hyphen in breaking points according to the dictionary."""
    lines = []
    for line in text.splitlines():
        hyph_words = [dic.inserted(word, hyphen) for word in line.split()]
        lines.append(' '.join(hyph_words))
    return '\n'.join(lines)

print(insert_soft_hyphens('Roses are red\nViolets are blue', '-'))
Ros-es are red
Vi-o-lets are blue
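With the default `'\xad'` instead of a visible `'-'`, the text looks unchanged but carries its breaking points with it. Here is a small sketch of how those three kinds of breaking points could be found in a string. The `break_points` helper and the hand-written sample are only for illustration (they are not part of the project), and the sample places the soft hyphens by hand instead of calling Pyphen.

```python
SHY = '\xad'  # SOFT HYPHEN
BREAK_CHARS = ('\n', ' ', SHY)

def break_points(text):
    """Return the indexes of the characters where a line may be broken."""
    return [i for i, ch in enumerate(text) if ch in BREAK_CHARS]

# Hand-written sample, hyphenated as in the output above
sample = 'Vi' + SHY + 'o' + SHY + 'lets are blue'

print(break_points(sample))  # [2, 4, 9, 13]
# Removing the soft hyphens gives back the original text
print(sample.replace(SHY, '') == 'Violets are blue')  # True
```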
So, with this code ready, we can get to work on implementing hyphenation support in our layout function.
First, this code is exactly as it was before:
from fonts import adjust_widths_by_letter

class Box():
    def __init__(self, x=0, y=0, w=1, h=1, stretchy=False, letter='x'):
        """We accept a few arguments to define our box, and we store them."""
        self.x = x
        self.y = y
        self.w = w
        self.h = h
        self.stretchy = stretchy
        self.letter = letter

    def __repr__(self):
        """This is what is shown if we print a Box. We want it to be useful."""
        return 'Box(%s, %s, %s, %s, "%s")' % (self.x, self.y, self.w, self.h, self.letter)

# A few pages all the same size
pages = [Box(i * 35 + 1, 1, 30, 50) for i in range(10)]

import svgwrite

def draw_boxes(boxes, name='lesson9.svg', hide_boxes=False):
    dwg = svgwrite.Drawing(name, profile='full', size=(32, 22))
    # Draw the pages
    for page in pages:
        dwg.add(dwg.rect(insert=(page.x, page.y),
                         size=(page.w, page.h), fill='lightyellow'))
    # Draw each box (unless hidden) and the letter inside it
    for box in boxes:
        color = 'green' if box.stretchy else 'red'
        if not hide_boxes:
            dwg.add(dwg.rect(insert=(box.x, box.y),
                             size=(box.w, box.h), fill=color))
        if box.letter:
            dwg.add(dwg.text(box.letter, insert=(box.x, box.y + box.h),
                             font_size=box.h, font_family='Arial'))
    dwg.save()
We do need to make a small change to how we load our text, to add the hyphens:
p_and_p = open('pride-and-prejudice.txt').read()
p_and_p = insert_soft_hyphens(p_and_p)  # This is the new line
text_boxes = []
for l in p_and_p:
    text_boxes.append(Box(letter=l, stretchy=l==' '))
adjust_widths_by_letter(text_boxes)
And now our layout function. One first approach, which we will refine later, is to simply refuse to break a line if we are not in a “good” place to break it.
Then, we inject a box with a visible hyphen at the line break, and that’s it.
Here is the code to create a box with a hyphen:
def hyphenbox():
    b = Box(letter='-')
    adjust_widths_by_letter([b])
    return b
And here finally, our layout supports hyphens:
# We add a "separation" constant so you can see the boxes individually
separation = .05

def layout(_boxes):
    """Layout boxes along pages.

    Keep in mind that this function modifies the boxes themselves, so
    you should be very careful about trying to call layout() more than once
    on the same boxes.

    Specifically, some spaces will become 0-width and not stretchy.
    """

    # Because we modify the box list, we will work on a copy
    boxes = _boxes[:]
    # We start at page 0
    page = 0
    # The 1st box should be placed in the correct page
    previous = boxes.pop(0)
    previous.x = pages[page].x
    previous.y = pages[page].y
    row = []
    while boxes:
        # We take the new 1st box
        box = boxes.pop(0)
        # And put it next to the other
        box.x = previous.x + previous.w + separation
        # At the same vertical location
        box.y = previous.y

        # The next 10 lines are almost all the change
        break_line = False
        # But if it's a newline
        if (box.letter == '\n'):
            break_line = True
            # Newlines take no horizontal space ever
            box.w = 0
            box.stretchy = False
        # Or if it's too far to the right...
        elif (box.x + box.w) > (pages[page].x + pages[page].w) and box.letter in (' ', '\xad'):
            if box.letter == '\xad':
                # Add a visible hyphen in the row
                h_b = hyphenbox()
                h_b.x = previous.x + previous.w + separation
                h_b.y = previous.y
                _boxes.append(h_b)  # So it's drawn
                row.append(h_b)  # So it's justified
            break_line = True
            # We adjust the row
            # Remove all right-margin spaces
            while row[-1].letter == ' ':
                row.pop()
            slack = (pages[page].x + pages[page].w) - (row[-1].x + row[-1].w)
            stretchies = [b for b in row if b.stretchy]
            if stretchies:
                bump = slack / len(stretchies)
                # Each stretchy gets wider
                for b in stretchies:
                    b.w += bump
                # And we put each thing next to the previous one
                for j, b in enumerate(row[1:], 1):
                    b.x = row[j-1].x + row[j-1].w + separation
            else:
                # Nothing stretches!!! Do it like before.
                bump = slack / len(row)
                for i, b in enumerate(row):
                    b.x += bump * i

        if break_line:
            # We start a new row
            row = []
            # We go all the way left and a little down
            box.x = pages[page].x
            box.y = previous.y + previous.h + separation

        # But if we go too far down
        if box.y + box.h > pages[page].y + pages[page].h:
            # We go to the next page
            page += 1
            # And put the box at the top-left
            box.x = pages[page].x
            box.y = pages[page].y

        # Put the box in the row
        row.append(box)

        # Collapse all left-margin space
        if all(b.letter == ' ' for b in row):
            box.w = 0
            box.stretchy = False
            box.x = pages[page].x

        previous = box

layout(text_boxes)
draw_boxes(text_boxes, hide_boxes=True)
And there in “proper-ty” you can see it in action. Of course this is a naïve implementation. What happens if you just can’t break?
many_boxes = [Box(letter='a') for i in range(200)]
adjust_widths_by_letter(many_boxes)
layout(many_boxes)
draw_boxes(many_boxes, hide_boxes=True, name='lesson9_lots_of_a.svg')
Since it can’t break at all, it just goes on and on.
And there are other corner cases!
many_boxes = [Box(letter='a') for i in range(200)]
many_boxes[100] = Box(letter=' ', stretchy=True)
adjust_widths_by_letter(many_boxes)
layout(many_boxes)
draw_boxes(many_boxes, hide_boxes=True, name='lesson9_one_break.svg')
Because there is only one place to break the line, it then tries to wedge 100 letters “a” where there is room for 54 (I counted!) and something interesting happens... the “slack” is negative!
Instead of stretching out an “underfilled” line, we are squeezing an “overfilled” one. Everything gets packed too tight, and the letters start overlapping one another.
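To see the sign flip with numbers, here is a rough sketch of the same slack arithmetic used in `layout()`. The page geometry and `separation` are the ones used in this lesson, but the letter width of 0.5 is an assumption picked for illustration (the real widths come from the font), and `row_slack` is a hypothetical helper.

```python
page_x, page_w = 1, 30   # same geometry as pages[0]
separation = 0.05
letter_w = 0.5           # ASSUMED width of one "a", not measured from the font

def row_slack(n_letters):
    """Space left over after placing n_letters equal boxes side by side."""
    row_end = page_x + n_letters * letter_w + (n_letters - 1) * separation
    return (page_x + page_w) - row_end

for n in (40, 54, 100):
    slack = row_slack(n)
    # This is the "nothing stretches" branch: box number i moves by bump * i
    bump = slack / n
    print(n, round(slack, 2), round(bump, 3))
```

A positive `bump` spreads the letters out towards the right margin. A negative one pulls every letter to the left, each a bit more than the one before it, which is why they end up on top of one another.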
The lesson is that just because it works for the usual case doesn’t mean it’s done. Even with real words, breaking points can take a while to appear, and our line becomes overfull.
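One way to get a feeling for how bad this can get with real text is to look for the longest stretch of characters without any breaking point. This is only a sketch: `longest_unbreakable_run` is a hypothetical helper, and the sample has its soft hyphens placed by hand, so they are not necessarily where Pyphen would put them.

```python
import re

# A newline, a space or a soft hyphen: the places where layout() may break
BREAKS = re.compile('[ \n\xad]')

def longest_unbreakable_run(text):
    """Return the longest piece of text that has no breaking point inside."""
    return max(BREAKS.split(text), key=len)

# Hand-placed soft hyphens, for illustration only
sample = 'a strengths ex\xadam\xadple'

run = longest_unbreakable_run(sample)
print(run, len(run))  # strengths 9
```

If that run is wider than the page, no amount of waiting for a good place to break will keep the line from becoming overfull.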
We will tackle that problem next.
Chapter 12
BOXES v10
Chapter 13
Dependencies
(add list of things used in the code with references)